An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B.
It is used extensively in epidemiology and is defined as the ratio of the odds of A occurring in the presence of B to the odds of A occurring in the absence of B. For example, it can compare the odds of death among smokers with the odds of death among non-smokers.
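To make the definition concrete, the sketch below converts two conditional probabilities into odds and takes their ratio. The helper names `odds` and `odds_ratio_from_probs` and the probabilities 0.6 and 0.3 are assumptions made up for illustration.

```r
# Hypothetical helpers, written for illustration only.

# Convert a probability p (0 < p < 1) into odds p / (1 - p).
odds <- function(p) {
  stopifnot(all(p > 0 & p < 1))
  p / (1 - p)
}

# Odds ratio: odds of A when B is present over odds of A when B is absent.
odds_ratio_from_probs <- function(p_a_given_b, p_a_given_not_b) {
  odds(p_a_given_b) / odds(p_a_given_not_b)
}

# Illustrative probabilities (assumed values, not real data)
example_or <- odds_ratio_from_probs(0.6, 0.3)
example_or
```

With these assumed inputs the odds are 1.5 and roughly 0.43, so the ratio comes out at about 3.5.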
In the following tutorial, we’ll calculate the odds ratio using R with a hypothetical dataset.
We’ll assume we have a dataset of patients, some of whom have been exposed to a certain treatment. We’re interested in whether the treatment is associated with recovery.
95.1.1 Step 1: Create the dataset
First, we create a contingency table of treatment exposure and patient recovery.
# Define the counts of recovery vs. no recovery for both treatment and control groups
treatment_recovered <- 60      # Patients recovered with treatment
treatment_not_recovered <- 40  # Patients not recovered with treatment
control_recovered <- 30        # Patients recovered without treatment
control_not_recovered <- 70    # Patients not recovered without treatment

# Create a matrix to represent this data
data_matrix <- matrix(
  c(treatment_recovered, treatment_not_recovered,
    control_recovered, control_not_recovered),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Treatment", "Control"),
                  c("Recovered", "Not_Recovered"))
)

# Look at the matrix
data_matrix
          Recovered Not_Recovered
Treatment        60            40
Control          30            70
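Before computing odds, it can help to sanity-check the table. A minimal sketch, assuming the same counts as above: `rowSums()` gives the group sizes and `prop.table()` with `margin = 1` gives the proportion recovered within each group. The name `recovery_table` is a placeholder chosen for this example.

```r
# Rebuild the 2x2 table so this snippet runs on its own
recovery_table <- matrix(
  c(60, 40,
    30, 70),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Treatment", "Control"),
                  c("Recovered", "Not_Recovered"))
)

# Number of patients in each group
group_sizes <- rowSums(recovery_table)
group_sizes

# Row-wise proportions: share recovered / not recovered within each group
row_props <- prop.table(recovery_table, margin = 1)
row_props
```

Each group has 100 patients here, so the row proportions are simply the counts divided by 100.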
95.1.2 Step 2: Calculate the odds ratio
We can now calculate the odds ratio.
The odds of recovery in the treatment group are treatment_recovered / treatment_not_recovered, and in the control group they are control_recovered / control_not_recovered.
The OR is the ratio of these two odds.
# Calculate the Odds Ratio manually
treatment_odds <- treatment_recovered / treatment_not_recovered
control_odds <- control_recovered / control_not_recovered
odds_ratio <- treatment_odds / control_odds

# Print the Odds Ratio
odds_ratio
[1] 3.5
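The same value can be reached with the cross-product form of the odds ratio, (a × d) / (b × c), where a and b are the first row of the table and c and d the second. The sketch below is an algebraically equivalent rearrangement of the manual calculation; the function name `cross_product_or` is hypothetical.

```r
# Hypothetical helper: odds ratio of a 2x2 table via the cross-product
# (a * d) / (b * c), with rows = groups and columns = outcome yes / no.
cross_product_or <- function(tab) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Counts from the tutorial's table
counts <- matrix(c(60, 40,
                   30, 70),
                 nrow = 2, byrow = TRUE)

cross_product_or(counts)  # (60 * 70) / (40 * 30)
```

Because this is the same quantity written differently, it should agree with the manual result of 3.5.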
95.1.3 Step 3: Calculate the Odds Ratio Using a Predefined Function
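R's built-in fisher.test function performs Fisher's Exact Test, which is suitable for small sample sizes, and returns an odds ratio estimate together with a confidence interval. Note that this estimate is the conditional maximum likelihood estimate rather than the sample odds ratio, so it can differ slightly from the manual result.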
# Calculate the Odds Ratio using Fisher's Exact Test
fisher_result <- fisher.test(data_matrix)

# The odds ratio is given in the result, along with the confidence interval
fisher_odds_ratio <- fisher_result$estimate
conf_int <- fisher_result$conf.int

# Print the results
fisher_odds_ratio
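The object returned by `fisher.test` holds more than the estimate. As a sketch, assuming the same counts as in Step 1, the snippet below pulls out the p-value and puts the reported estimate next to the sample odds ratio. According to the R documentation, `fisher.test` reports the conditional maximum likelihood estimate, so a small difference between the two values is expected.

```r
# Same 2x2 table as in Step 1, rebuilt so the snippet is self-contained
tab <- matrix(c(60, 40,
                30, 70),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Treatment", "Control"),
                              c("Recovered", "Not_Recovered")))

res <- fisher.test(tab)

# Sample (unconditional) odds ratio, as computed by hand
sample_or <- (tab[1, 1] / tab[1, 2]) / (tab[2, 1] / tab[2, 2])

# Conditional maximum likelihood estimate reported by fisher.test
conditional_or <- unname(res$estimate)

c(sample = sample_or, conditional = conditional_or)

# Two-sided p-value for the null hypothesis that the true odds ratio is 1
res$p.value
```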
95.1.4 Step 4: Interpret the odds ratio
An OR of 1 suggests no association between the treatment and recovery.
An OR greater than 1 suggests increased odds of recovery associated with the treatment.
An OR less than 1 suggests decreased odds of recovery associated with the treatment.
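Which side of 1 the odds ratio falls on depends on which group is treated as the reference. The sketch below, using the tutorial's counts, shows that swapping the two groups yields the reciprocal of the original value. The variable names are placeholders for this example.

```r
# Odds of recovery in each group (counts from the tutorial's table)
treatment_odds <- 60 / 40
control_odds <- 30 / 70

# Treatment relative to control: greater than 1
or_treatment_vs_control <- treatment_odds / control_odds

# Control relative to treatment: less than 1, the reciprocal of the above
or_control_vs_treatment <- control_odds / treatment_odds

or_treatment_vs_control
or_control_vs_treatment
```

Both numbers describe the same association; only the direction of the comparison differs.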
95.1.5 Step 5: Calculate Confidence Intervals
A confidence interval gives a range of plausible values for the true odds ratio at a stated confidence level (typically 95%). The fisher.test call in Step 3 already computed this interval, so here we only extract and print it.
# Extracting the confidence interval from the Fisher's Exact Test
lower_ci <- conf_int[1]
upper_ci <- conf_int[2]

# Printing the confidence interval
cat("95% CI for OR: [", lower_ci, ",", upper_ci, "]", "\n")
95% CI for OR: [ 1.872893 , 6.566896 ]
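For comparison, a common large-sample alternative is the Wald interval, computed on the log scale as log(OR) ± z × SE, where SE = sqrt(1/a + 1/b + 1/c + 1/d) over the four cell counts. This is a different method from the exact conditional interval that `fisher.test` returns, so the bounds are not expected to match exactly. A minimal sketch, assuming the counts from Step 1 (the variable names are placeholders):

```r
# Cell counts from the tutorial's 2x2 table
trt_rec <- 60  # treatment, recovered
trt_not <- 40  # treatment, not recovered
ctl_rec <- 30  # control, recovered
ctl_not <- 70  # control, not recovered

# Log odds ratio and its approximate standard error
log_or <- log((trt_rec / trt_not) / (ctl_rec / ctl_not))
se_log_or <- sqrt(1 / trt_rec + 1 / trt_not + 1 / ctl_rec + 1 / ctl_not)

# 95% Wald interval, back-transformed to the odds ratio scale
z <- qnorm(0.975)
wald_ci <- exp(log_or + c(-1, 1) * z * se_log_or)
wald_ci

# Does the interval exclude 1 (the value meaning "no association")?
excludes_one <- wald_ci[1] > 1 || wald_ci[2] < 1
excludes_one
```

With these counts the Wald bounds are roughly 1.95 and 6.29. Like the exact interval above, this range lies entirely above 1, which is consistent with a positive association at the 5% level.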
Remember, an odds ratio does not imply causation and should be interpreted with caution, especially with observational data.